ICC | DCC | p-value1 | |
|---|---|---|---|
Number of transfusions | 0.001 | ||
Mean (SD) | 2.01 (2.89) | 1.64 (2.58) | |
Minimum, Maximum | 0, 30 | 0, 24 | |
Received transfusion, n (%) | 468 (61%) | 406 (52%) | <0.001 |
1Wilcoxon rank sum test; Pearson's Chi-squared test | |||
Lecture 5
NHMRC Clinical Trials Centre, University of Sydney
gillian.heller@sydney.edu.au
\mathcal{S}=\{0,1,2,\ldots\}
Response is generally a count
Starting point: Poisson distribution
\begin{align*} y&\sim\text{PO}(\mu)\\ \mathbb{E}(y)&=\text{Var}(y)=\mu \end{align*}
Problems
Overdispersion/heavy tail \quad \text{Var}(y)>\mathbb{E}(y)
Excess/lack of zeroes
ZA = zero adjusted, ZI = zero inflated
\begin{equation*} \begin{split} \mathbb{P}(Y=y)= \begin{cases}\textcolor{red}{\pi}+(1-\pi) \mathbb{P}\left(Y_1=0\right) & \text { if } y=0 \\ (1-\pi) \mathbb{P}\left(Y_1=y\right) & \text { if } y=1,2,3, \ldots \end{cases} \end{split} \end{equation*}
\mathbb{P}(Y_1) is the probability function of the response variable without zero inflation
\mathbb{P}(Y) is that of the zero-inflated response distribution.
Assumption: a proportion of the population (\pi) generates zero with certainty, and the remaining (1-\pi) generates counts according to the assumed response distribution.
number of falls, number of blood transfusions
species abundance counts
number of wildfires
etc …
\begin{equation*} \begin{split} \mathbb{P}(Y=y)= \begin{cases}\textcolor{red}{\pi} & \text { if } y=0 \\ (1-\pi) \mathbb{P}\left(Y_2=y\right) & \text { if } y=1,2,3, \ldots \end{cases} \end{split} \end{equation*}
In ZA (hurdle) model, the zero probability can be inflated or deflated with respect to the original zero probability
In ZI model, only inflation is possible.
If you need a ZI or ZA version of a distribution that isn’t supplied by gamlss (e.g. ZIWARING), this can be user-defined.
\mathcal{S}=\{0, 1, \ldots, n\}
These generally are binomial-type responses
The Australian Placental Transfusion Study (APTS, Tarnow-Mordi et al., 2017) is an international, multicentre, randomized controlled trial of 1566 preterm infants born at 30 weeks gestation or earlier.
The trial compared
Primary outcome: death or major morbidity at 36 weeks gestation.
Here we analyse the number of blood transfusions received by the infant by 36 weeks gestation
ICC | DCC | p-value1 | |
|---|---|---|---|
Number of transfusions | 0.001 | ||
Mean (SD) | 2.01 (2.89) | 1.64 (2.58) | |
Minimum, Maximum | 0, 30 | 0, 24 | |
Received transfusion, n (%) | 468 (61%) | 406 (52%) | <0.001 |
1Wilcoxon rank sum test; Pearson's Chi-squared test | |||
APTSdata |>
dplyr::select(Treatment, transfusions, gestationalage) |>
na.omit() -> APTSsubset
m1 = gamlss(transfusions ~ Treatment + gestationalage,
sigma.fo=~Treatment + gestationalage,
nu.formula =~Treatment + gestationalage,
tau.formula = ~Treatment + gestationalage,
data=APTSsubset, n.cyc=100, trace = FALSE)
chooseDist(m1, type="counts") minimum GAIC(k= 2 ) family: ZISICHEL
minimum GAIC(k= 3.84 ) family: ZISICHEL
minimum GAIC(k= 7.35 ) family: SI
2 3.84 7.35
PO 5609.248 5614.768 5625.298
GEOM 5033.518 5039.038 5049.568
GEOMo 5033.518 5039.038 5049.568
LG NA NA NA
YULE 5634.311 5639.831 5650.361
ZIPF NA NA NA
WARING 5005.754 5016.794 5037.854
GPO 4931.551 4942.591 4963.651
DPO 5041.813 5052.853 5073.913
BNB 4918.446 4935.006 4966.596
NBF 4933.775 4950.335 4981.925
NBI 4938.126 4949.166 4970.226
NBII 4938.126 4949.166 4970.226
PIG 4930.380 4941.420 4962.480
ZIP 5234.362 5245.402 5266.462
ZIP2 5269.020 5280.060 5301.120
ZAP 5235.848 5246.888 5267.948
ZALG 5116.385 5127.425 5148.485
DEL 4940.846 4957.406 4988.996
ZAZIPF 5534.915 5545.955 5567.015
SI 4907.584 4924.144 4955.734
SICHEL 4919.541 4936.101 4967.691
ZANBI 4980.174 4996.734 5028.324
ZAPIG 9099.858 9116.418 9148.008
ZINBI 4960.485 4977.045 5008.635
ZIPIG 5553.932 5570.492 5602.082
ZINBF 4936.486 4958.566 5000.686
ZABNB NA NA NA
ZASICHEL 4893.147 4915.227 4957.347
ZINBF 4936.486 4958.566 5000.686
ZIBNB 4912.423 4934.503 4976.623
ZISICHEL 4892.522 4914.602 4956.722
The zero-inflated Sichel distribution is chosen
The Sichel is a 3-parameter discrete distribution, capable of accommodating long tails:
GAMLSS-RS iteration 1: Global Deviance = 5002.486
GAMLSS-RS iteration 2: Global Deviance = 4966.412
GAMLSS-RS iteration 3: Global Deviance = 4960.555
GAMLSS-RS iteration 4: Global Deviance = 4958.091
GAMLSS-RS iteration 5: Global Deviance = 4957.213
GAMLSS-RS iteration 6: Global Deviance = 4957.069
GAMLSS-RS iteration 7: Global Deviance = 4957.059
GAMLSS-RS iteration 8: Global Deviance = 4957.058
GAMLSS-RS iteration 9: Global Deviance = 4957.058
---------------------------------------------------
Distribution parameter: mu
Start: AIC= 4967.06
transfusions ~ gestationalage
Df AIC
+ Treatment 1 4953.9
<none> 4967.1
Step: AIC= 4953.91
transfusions ~ gestationalage + Treatment
---------------------------------------------------
Distribution parameter: sigma
Start: AIC= 4953.91
~1
Df AIC
+ gestationalage 1 4923.4
<none> 4953.9
+ Treatment 1 4955.9
Step: AIC= 4923.35
~gestationalage
Df AIC
+ Treatment 1 4923.1
<none> 4923.4
Step: AIC= 4923.13
~gestationalage + Treatment
---------------------------------------------------
Distribution parameter: nu
Start: AIC= 4923.13
~1
Df AIC
+ gestationalage 1 4901.8
<none> 4923.1
+ Treatment 1 4924.9
Step: AIC= 4901.82
~gestationalage
Df AIC
<none> 4901.8
+ Treatment 1 4906.9
---------------------------------------------------
Distribution parameter: tau
Start: AIC= 4901.82
~1
Df AIC
+ gestationalage 1 4891.0
<none> 4901.8
+ Treatment 1 4903.3
Step: AIC= 4890.95
~gestationalage
Df AIC
+ Treatment 1 4890.8
<none> 4891.0
- gestationalage 1 4901.8
Step: AIC= 4890.85
~gestationalage + Treatment
Df AIC
<none> 4890.8
- Treatment 1 4891.0
- gestationalage 1 4903.3
---------------------------------------------------
Distribution parameter: nu
Start: AIC= 4890.85
~gestationalage
Df AIC
<none> 4890.8
- gestationalage 1 4900.2
---------------------------------------------------
Distribution parameter: sigma
Start: AIC= 4890.85
~gestationalage + Treatment
Df AIC
- Treatment 1 4890.7
<none> 4890.8
- gestationalage 1 4891.6
Step: AIC= 4890.74
~gestationalage
Df AIC
<none> 4890.7
- gestationalage 1 4892.8
---------------------------------------------------
Distribution parameter: mu
Start: AIC= 4890.74
transfusions ~ gestationalage + Treatment
Df AIC
<none> 4890.7
- Treatment 1 4892.7
- gestationalage 1 5044.0
---------------------------------------------------
******************************************************************
Family: c("ZISICHEL", "Zero inflated Sichel")
Call: gamlss(formula = transfusions ~ gestationalage + Treatment,
sigma.formula = ~gestationalage, nu.formula = ~gestationalage,
tau.formula = ~gestationalage + Treatment, family = ZISICHEL,
data = APTSsubset, n.cyc = 50, trace = FALSE)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.04865 0.61220 18.047 <2e-16 ***
gestationalage -0.37903 0.02355 -16.094 <2e-16 ***
TreatmentDCC -0.12442 0.06496 -1.915 0.0556 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 125.108 109.199 1.146 0.252
gestationalage -4.241 3.706 -1.144 0.253
------------------------------------------------------------------
Nu link function: identity
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -27.3718 6.4285 -4.258 2.19e-05 ***
gestationalage 0.8821 0.2319 3.803 0.000148 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Tau link function: logit
Tau Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -23.5968 3.4995 -6.743 2.19e-11 ***
gestationalage 0.7861 0.1242 6.328 3.25e-10 ***
TreatmentDCC 0.7187 0.2909 2.471 0.0136 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
No. of observations in the fit: 1553
Degrees of Freedom for the fit: 10
Residual Deg. of Freedom: 1543
at cycle: 13
Global Deviance: 4870.737
AIC: 4890.737
SBC: 4944.217
******************************************************************
| Variable | Parameter | |||
|---|---|---|---|---|
| \mu | \sigma | \nu | \tau=\pi | |
| Treatment | \checkmark | \checkmark | ||
| Gestational age | \checkmark | \checkmark | \checkmark | \checkmark |
We didn’t include smooth terms for gestational age in stepwise selection
We can do this manually
mZISICHEL2 = gamlss(transfusions ~ Treatment + gestationalage,
sigma.fo=~gestationalage,
nu.formula =~ gestationalage,
tau.formula = ~Treatment + pb(gestationalage),
family=ZISICHEL,
data=APTSsubset, n.cyc=100, trace = FALSE)
LR.test(m.step, mZISICHEL2) Likelihood Ratio Test for nested GAMLSS models.
(No check whether the models are nested is performed).
Null model: deviance= 4870.737 with 10 deg. of freedom
Altenative model: deviance= 4867.573 with 11.22205 deg. of freedom
LRT = 3.163794 with 1.222045 deg. of freedom and p-value= 0.1000901
mZISICHEL3 = gamlss(transfusions ~ Treatment + pb(gestationalage),
sigma.fo=~gestationalage,
nu.formula =~ gestationalage,
tau.formula = ~Treatment + gestationalage,
family=ZISICHEL,
data=APTSsubset, n.cyc=100, trace = FALSE)
LR.test(m.step, mZISICHEL3) Likelihood Ratio Test for nested GAMLSS models.
(No check whether the models are nested is performed).
Null model: deviance= 4870.737 with 10 deg. of freedom
Altenative model: deviance= 4864.456 with 11.47356 deg. of freedom
LRT = 6.281552 with 1.473559 deg. of freedom and p-value= 0.02411701
at gestational ages 26.5 and 29 weeks (approx 1st and 3rd quartiles)
library(ggplot2)
library(distreg.vis)
library(gridExtra)
gest <- c(26.5, 29)
covariate_data <- structure(list(Treatment = structure(2:1,
levels = c("ICC", "DCC"), class = "factor"),
gestationalage = c(gest[1], gest[1])), class = "data.frame",
row.names = c("DCC", "ICC"))
pred_data <- preds(model = mZISICHEL2, newdata = covariate_data)
plot_dist(model = mZISICHEL2, pred_params = pred_data,
type = "pdf", palette = "Paired") +
lims(x=c(-0.5,10)) +
labs(title=paste0("a. Gestational age = ", gest[1], " weeks"))+
guides(fill=guide_legend(title='Treatment')) +
theme(text = element_text(size=8)) +
theme(legend.position="bottom") -> g1
covariate_data <- structure(list(Treatment = structure(2:1,
levels = c("ICC", "DCC"), class = "factor"),
gestationalage = c(gest[2], gest[2])), class = "data.frame",
row.names = c("DCC", "ICC"))
pred_data <- preds(model = mZISICHEL2, newdata = covariate_data)
plot_dist(model = mZISICHEL2, pred_params = pred_data,
type = "pdf", palette = "Paired") +
lims(x=c(-0.5,10)) +
labs(title=paste0("b. Gestational age = ", gest[2], " weeks"))+
guides(fill=guide_legend(title='Treatment')) +
theme(text = element_text(size=8)) -> g2
#extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)}
mylegend<-g_legend(g1)
g3 <- grid.arrange(arrangeGrob(g1 + theme(legend.position="none"),
g2 + theme(legend.position="none"),
nrow=1),
mylegend, nrow=2,heights=c(10, 1))Stadlmann & Kneib (2022)
Lecture 5